challenge_3
Author

Young Soo Choi

Published

August 17, 2022

Code
library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to:

  1. read in a data set, and describe the data set using both words and any supporting information (e.g., tables)
  2. identify what needs to be done to tidy the current data
  3. anticipate the shape of pivoted data
  4. pivot the data into tidy format using pivot_longer

Read in data

Read in the egg price data.

Code
eggs <- read_csv("_data/eggs_tidy.csv")
eggs
# A tibble: 120 × 6
   month      year large_half_dozen large_dozen extra_large_half_dozen extra_l…¹
   <chr>     <dbl>            <dbl>       <dbl>                  <dbl>     <dbl>
 1 January    2004             126         230                    132       230 
 2 February   2004             128.        226.                   134.      230 
 3 March      2004             131         225                    137       230 
 4 April      2004             131         225                    137       234.
 5 May        2004             131         225                    137       236 
 6 June       2004             134.        231.                   137       241 
 7 July       2004             134.        234.                   137       241 
 8 August     2004             134.        234.                   137       241 
 9 September  2004             130.        234.                   136.      241 
10 October    2004             128.        234.                   136.      241 
# … with 110 more rows, and abbreviated variable name ¹​extra_large_dozen
# ℹ Use `print(n = ...)` to see more rows

Briefly describe the data

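This dataset shows monthly egg prices, apparently starting in January 2004, categorized by egg size (large or extra large) and carton quantity (half dozen or dozen). The data has 6 columns: month, year, and one price column for each of the 4 size and quantity combinations. Because each type has its own column, the prices are hard to compare at a glance, so I will pivot the data. The 4 type columns will be gathered into a single “types” column, with their values in a “price” column.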
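The starting year and the number of months recorded per year can be checked directly instead of being read off the first printed rows. The following is a minimal sketch, not part of the original analysis: `eggs_toy` is a hypothetical stand-in built from three rows of the printout above so the code runs without the CSV. With the real data, `eggs` would take its place.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for `eggs`: three rows copied from the printout above
eggs_toy <- tibble(
  month = c("January", "March", "May"),
  year = c(2004, 2004, 2004),
  large_half_dozen = c(126, 131, 131),
  large_dozen = c(230, 225, 225),
  extra_large_half_dozen = c(132, 137, 137),
  extra_large_dozen = c(230, 230, 236)
)

# First and last year covered by the data
year_range <- range(eggs_toy$year)

# Number of monthly observations recorded for each year
rows_per_year <- count(eggs_toy, year)
```

On the full dataset, `year_range` would confirm or correct the guess about 2004, and `rows_per_year` would show whether every year has all 12 months.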

Anticipate the End Result

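In the ‘eggs’ dataset, 2 of the 6 variables (month and year) identify a case, and the other 4 hold prices. After pivoting, each original row becomes 4 rows, so I expect 120 × 4 = 480 rows. The 2 identifier columns plus the 2 new columns (types and price) give 4 columns.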

Code
#existing rows/cases
nrow(eggs)
[1] 120
Code
#existing columns/variables
ncol(eggs)
[1] 6
Code
#expected rows/cases
nrow(eggs) * (ncol(eggs)-2)
[1] 480
Code
# expected columns: 2 identifiers (month, year) + 2 new columns (types, price)
2 + 2
[1] 4

Pivot the Data

Code
#pivot data

# gather the four price columns into "types" (names) and "price" (values)
pivot_eggs <- pivot_longer(
  eggs,
  cols = c(large_half_dozen, large_dozen, extra_large_half_dozen, extra_large_dozen),
  names_to = "types",
  values_to = "price"
)
pivot_eggs
# A tibble: 480 × 4
   month     year types                  price
   <chr>    <dbl> <chr>                  <dbl>
 1 January   2004 large_half_dozen        126 
 2 January   2004 large_dozen             230 
 3 January   2004 extra_large_half_dozen  132 
 4 January   2004 extra_large_dozen       230 
 5 February  2004 large_half_dozen        128.
 6 February  2004 large_dozen             226.
 7 February  2004 extra_large_half_dozen  134.
 8 February  2004 extra_large_dozen       230 
 9 March     2004 large_half_dozen        131 
10 March     2004 large_dozen             225 
# … with 470 more rows
# ℹ Use `print(n = ...)` to see more rows
Code
#change the order of columns
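# select by name rather than position so the step still works if the column order changes
pivot_eggs <- select(pivot_eggs, year, month, types, price)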
pivot_eggs
# A tibble: 480 × 4
    year month    types                  price
   <dbl> <chr>    <chr>                  <dbl>
 1  2004 January  large_half_dozen        126 
 2  2004 January  large_dozen             230 
 3  2004 January  extra_large_half_dozen  132 
 4  2004 January  extra_large_dozen       230 
 5  2004 February large_half_dozen        128.
 6  2004 February large_dozen             226.
 7  2004 February extra_large_half_dozen  134.
 8  2004 February extra_large_dozen       230 
 9  2004 March    large_half_dozen        131 
10  2004 March    large_dozen             225 
# … with 470 more rows
# ℹ Use `print(n = ...)` to see more rows
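The “types” column still packs two pieces of information into one value: egg size and carton quantity. One possible variation, shown here only as a sketch, is to let `pivot_longer` split the column names while pivoting, using `names_pattern`. The names `eggs_toy`, `size`, and `quantity` are assumptions introduced for illustration, and the toy rows are copied from the printout above.

```r
library(tidyr)
library(tibble)

# Hypothetical stand-in for `eggs`: two rows copied from the printout above
eggs_toy <- tibble(
  month = c("January", "March"),
  year = c(2004, 2004),
  large_half_dozen = c(126, 131),
  large_dozen = c(230, 225),
  extra_large_half_dozen = c(132, 137),
  extra_large_dozen = c(230, 230)
)

# Each capture group in names_pattern fills one of the names_to columns:
# group 1 -> size, group 2 -> quantity
eggs_split <- pivot_longer(
  eggs_toy,
  cols = -c(month, year),
  names_to = c("size", "quantity"),
  names_pattern = "^(extra_large|large)_(half_dozen|dozen)$",
  values_to = "price"
)
```

This yields 5 columns instead of 4 (month, year, size, quantity, price) with the same number of rows, which can make it easier to group or filter by size or by quantity separately.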

Any additional comments?

The pivoted dataset has 480 rows and 4 columns, which matches the dimensions anticipated before pivoting.
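The comparison between anticipated and actual dimensions can also be made programmatically rather than by eye. A minimal sketch, assuming a small hypothetical stand-in tibble (`eggs_toy`, rows copied from the printout above) in place of the full `eggs` data:

```r
library(tidyr)
library(tibble)

# Hypothetical stand-in for `eggs`: two rows copied from the printout above
eggs_toy <- tibble(
  month = c("January", "March"),
  year = c(2004, 2004),
  large_half_dozen = c(126, 131),
  large_dozen = c(230, 225),
  extra_large_half_dozen = c(132, 137),
  extra_large_dozen = c(230, 230)
)

# Columns that identify a case; every other column holds a price
id_cols <- c("month", "year")
n_value_cols <- ncol(eggs_toy) - length(id_cols)

# Anticipated shape: one row per case and price column,
# and the identifier columns plus the two new columns (types, price)
expected_rows <- nrow(eggs_toy) * n_value_cols
expected_cols <- length(id_cols) + 2

long_toy <- pivot_longer(
  eggs_toy,
  cols = !all_of(id_cols),
  names_to = "types",
  values_to = "price"
)

# Stops with an error if the pivoted shape differs from the anticipated one
stopifnot(
  nrow(long_toy) == expected_rows,
  ncol(long_toy) == expected_cols
)
```

With the real data, the same check would compare `pivot_eggs` against 480 rows and 4 columns.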